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Abstract 


Existing circuit models for short-channel MOS transistors represent a compromise 
between computation speed and ease of use. Empirical models are very fast to 
evaluate, but their parameters must be fitted from experimental measurements. 
Theoretical models require longer’ computation time, but they may be used to predict 
the performance of new, unmeasured MOS technologies since their parameters are not 
curve-fitted from experimental data. 


This thesis combines the best features of both types of model, yielding a fast circuit 
simulator whose input parameters need not be extracted from experimental 
measurements. A nonlinear optimization algorithm is used to "compile" the parameters 
of a theoretical model into parameters for an empirical model, providing the superior 
user-interface of theoretical models without sacrificing simulator execution speed. 
Results produced by a prototype model compiler are presented, showing the modeling 
error to be approximately 5 percent. 
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Chapter 1: introduction 


One of the major challenges in integrated circuit engineering is to minimize the 
number of design iterations required to fully debug a chip. Error tracing is made 
difficult by the small size of the chip, since micro-probes or electron microscopes are 
required to monitor the behavior of internal nodes. When errors are finally located and 
corrections are proposed, they are both expensive and time-consuming to implement: 
the entire chip must be refabricated, even if the modification affects only one 
transistor. This process may cost up to $20,000 and require 6 weeks to complete. 

The best way to avoid design revisions is to debug a chip before it is ever 
fabricated; usually this is done by computer simulation. Simulation makes circuit 
behavior easy to examine, so malfunctions can be rapidly located and _ repaired. 
Proposed modifications are then simulated to see if they remove the error. An iterative 
process of redesign and re-simulation is performed until the simulated circuit behavior 
is acceptable. 

This thesis is concerned with the specific task of MOS circuit simulation; that is, 
accurately predicting the analog waveforms produced in a network of MOS transistors 
under specified input excitations. Simulation accuracy is impossible unless the 
nonlinear characteristics of the individual transistors are accurately modeled. Providing 
acceptable modeling accuracy at low (simulation) cost is the goal of this work. 

High performance is obtained in modern MOS technologies by scaling down the 
transistor dimensions [11, 29]. Unfortunately, as they become smaller, transistors 
deviate significantly from the behavior predicted by classical gradual-field theories. 
These "short channel effects" drastically complicate the physical theory of transistor 
behavior, and hence increase the complexity of transistor models. Two distinct 
approaches to short channel transistor modeling have been successfully employed in 
circuit simulators: theoretical models (e.g., [13]) and empirical models (e.g., [19]). The 


terms "theoretical" and "empirical" are new; | use them to call attention to the 


structural difference between two general classes of MOS models. 

Theoretical models are created by performing a complete _ solid-state 
electrodynamic analysis of the MOS transistor structure. Each identifiable physical 
effect is explicitly accounted for, in an attempt to eliminate all possible sources of 
error. The resulting equations are then directly used as a model for circuit simulation. 

The parameters of a theoretical model are simply the physical attributes of a 
particular MOS technology (e.g., oxide thickness, substrate doping, junction depth, 
etc.}. These constants are carefully measured and closely controlled in the 
semiconductor fabrication process. Because the model parameters are so accurately 
known, theoretical models are extremely easy to use: physical constants of the 
technology are plugged directly into the circuit simulator. 

New MOS processes are also readily simulated with theoretical models, by inserting 
target values of the physical fabrication parameters into the simulator. Circuits can be 
accurately simulated even before the first wafer has been processed, allowing chip 
design and technology development to proceed in parallel. The price paid for this 
extreme flexibility is slow simulation. Each separate physical effect is modeled 
individually; some effects require transcendental function calculations or nested 
iterations. When all of these calculations are combined, the final modeling routine 
becomes extremely complex, requiring a very large number of computational operations 
to predict transistor behavior. The high cost of theoretical models has led to the 
development of an alternative strategy, which | will refer to as “empirical” modeling. 

Empirical models are not rooted in solid-state physics; rather, they are 
mathematical expressions which provide a good curve-fit to the behavior of MOS 
transistors. Whereas theoretical models strive for accuracy at any cost, empirical 
models strive for acceptable accuracy at very low cost. Since individual transistors on 


the same physical wafer are subject to parameter variations, modeling accuracy beyond 


a certain level is illusory. 

Instead of separately accounting for each individual physical effect, empirical 
models often lump several effects together, and they occasionally ignore some effects 
entirely. Accuracy is thus traded against model evaluation speed, accepting 
engineering approximations in. return for fast simulation. As R. C. Foss has said [19], 
"A +230percent representation of a 10 percent effect gives 3 percent accuracy." 

Since their equations are not constrained to be physically motivated, empirical 
models can employ any convenient mathematical expressions which give reasonable 
accuracy. Simple polynomials are often used, because they can provide acceptable 
curve-fits while permitting fast evaluation. The coefficients of these polynomials are the 
"fudge factors" which are adjusted to obtain a good fit. Because the "fudge factors" 
are not physical constants of the fabrication process, their values are unpredictable, 
and must be extracted from measurements made on actual transistors. New, as-yet- 
unfabricated technologies cannot be simulated with empirical models, because no test 
transistors exist for parameter extraction. 

A hybrid modeling scheme which combines the flexibility of theoretical models with 
the fast simulation speed of empirical models is presented in this thesis. Model 
parameters are (easily obtainable) physical fabrication constants, yet high speed 
empirical equations are implemented in the circuit simulator. This is achieved by 
"compiling" the physical constants for a theoretical model into the parameters 
necessary to drive an empirical model which is used by the circuit simulator. 

The model parameter compilation process is analogous to a high-level computer 
language compiler. A slow translation program is invoked once, generating a fast 
object program which is subsequently invoked many times. In the MOS modeling 
domain, the “object program" is a set of parameters which will drive a fast empirical 
model (permitting fast circuit simulation). The "translation program" uses the input 
theoretical model parameters (the "source program") to simulate several test 


transistors, mimicking the measurements one would make to perform a manual 


parameter extraction. These measurements are fed into a multiple-parameter nonlinear 
curve fitting program, which generates optimal (best-fit) parameters to drive the 
empirical model. 

In some eased it might be possible to analytically compute the empirical model 
parameters, given the theoretical model parameters (e.g., when the empirical model is a 
subset of the theoretical model). However, a more general solution is desirable, which 
would permit compiling betweén any two arbitrarily selected models. Parameters could 
then be compiled for several different empirical models, to evaluate which one gives 
the best tradeoff between speed and accuracy. To allow these sorts of experiments, a 
curve-fitting program was constructed. 

Of course, transistor modeling is only one part of the work performed by circuit 
simulators, so it is possible that a drastic speedup in transistor modeling might result in 
only slightly improved total simulation speed. Figure 1.1 shows the total time required 
to simulate the same circuit using two different MOS transistor models. One model 
(Berkeley) is theoretical, and the other model (Mosaid) is empirical. Data for this figure 
is taken from the SPICE 2G.5 simulator, running on the VAX! 11/780 computer under 
the V7 UNIX? operating system. For this particular circuit, simulation time was reduced 
by 35%, simply by switching to an empirical model. 

The present modeling effort is concentrated on the steady-state (DC) current- 
voltage characteristics of MOS transistors. Although capacitances and their model 
parameters have been omitted for simplicity, there is no fundamental limitation which 
prevents the compilation technique from extracting them. Indeed, since most 
capacitance parameters do not interact with DC parameters, two separate compilations 
could be employed, thus reducing the dimensionality of the parameter-space and 


VAX is a registered trademark of Digital Equipment Corporation. 


UNIX is a registered trademark of Bell Telephone Laboratories. 
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Figure 1.1. Circuit simulation time for different MOS models 


improving compilation speed. 

The present effort emphasizes the compilation aspects of transistor modeling; old, 
well-known models were used in the compilation experiments, rather than developing 
entirely new models. However, the technique presented here is a general one, allowing 
parameters for virtually any MOS model to be compiled into parameters for virtually any 
other model. (For reasons of time, this work concentrated on the Berkeley and Mosaid 
models). The same method could be applied, for example, to generate table entries for 
a totally table-driven empirical model [41], or to compute optimal coefficients for a 
cubic-spline interpolation scheme. 

The use of numerical minimization for parameter extraction has applications 
besides the proposed compilation strategy. If an MOS technology has been in 
existence for some time, actual test transistors are available for measurement, and this 
data can be fed to the minimization routine. Optimal-fit parameters can then be 


extracted directly from the physical data [7, 39]. However, this technique does not 
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permit the designer to easily perturb the simulation (e.g., to see the effects of over- 
etched polysilicon), because the empirical model parameters are not directly related to 
physical quantities. 

This thesis is divided into five chapters. Chapter 1 discusses the idea of using a 
two-level modeling technique,.and proposes the technique of constructing a parameter 
compiler. MOS models are the topic in chapter 2. The distinctions between theoretical 
and empirical models are discussed, and the two example models used in the 
parameter compilation experiments are described. Chapter 3 presents numerical 
algorithms for nonlinear minimization; they are subsequently used in the parameter 
compilation programs. Chapter 4 discusses the process of model parameter extraction. 
Manual extraction techniques are presented, and then fully automated procedures are 
described. Results obtained from prototype versions of model parameter compilers are 
presented, and the computational benefits of the compilation strategy are outlined. 
Chapter 5 presents some concluding remarks, along with suggestions for extending and 


improving this work. 
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Chapter 2: DC Models for MOS Transistors 


2.1: Introduction 


This chapter explores the problem of constructing a circuit model of the MOS 
field-effect transistor. The model is nothing more than a set of equations which 
predicts the device’s current-voltage behavior. A familiar example of a model is Ohm's 
Law, which states that the current through a resistor is linearly proportional to the 
voltage across its terminals: 


A simple, intuitive model of an idealized transistor is presented, to show the 
general form of model equations and to exhibit typical |-V characteristics. Model 
refinements are then introduced, which attempt to account for the non-ideal behavior 
observed in real transistors. As refinements are added, the complexity of model 
equations is increased, and the distinction between theoretical and empirical models 
becomes clearer. The equations for a typical theoretical model and for a typical 
empirical model are presented; they are subsequently used in the parameter 


compilation experiments described in chapter 4. 
2.2: The MOS Field-Effect Transistor 


A cutaway view of an n-channel MOS transistor is shown in Figure 2.1. The 
transistor has four terminals, called the Drain, Source, Gate, and Bulk (or substrate). 
The drain and source terminals are n-type diffused regions, while the bulk is p-type 


silicon; these pn junction diodes are reverse biased under normal operating conditions. 
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bulk (p-type Silicon) 


Figure 2.1 MOS Field Effect Transistor 


The gate terminal is typically made of polycrystalline silicon, and is separated from 
the rest of the device structure by a thin insulating layer of silicon dioxide’. No DC 
current can flow into the gate terminal because of this insulator, so (unlike the bipolar 
junction transistor) the MOS transistor is a voltage-controlled device, as shown in the 
circuit model of Figure 2.2. Although the drain and source terminals are symmetric, a 


labeling convention is adopted such that Vos > 0. 


Current-voltage characteristics of MOS transistors can be (roughly) divided into 
three separate regions of operation, labeled in Figure 2.3 as the Triode, Saturation, and 


Cutoff regions. Transistors operating in the cutoff region have zero drain current Ingi 


current begins to flow only when the gate-to-source voltage exceeds a threshold value 


called V;. N-channel devices with Vy 20 volts are often called "enhancement" 


transistors, while transistors with thresholds below O volts are called "depletion" 


Gate dielectric materials other than silicon dioxide are occasionally used in exotic 


devices. 
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Ids = f(Vgs, Vds, Vbs) 
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Figure 2.2 Circuit model of MOS transistor 
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Figure 2.3 Typical current - voltage characteristics 


devices. 


In the triode region, drain current is an increasing function of drain voltage Vos: 


However, as drain voltage is increased, the current eventually levels off ("saturates") 


and the transistor enters the saturation region. The transition from triode to saturation 
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occurs at a value of drain voltage known as V Figure 2.3 shows the locus of 


Dsat’ 


Vp ear values. 


The n-channel transistor depicted in Figure 2.1 is built on a p-type substrate; an 
n-type substrate (with p-type source and drain) can also be used, resulting in a p- 
channel transistor. P-channel devices operate exactly the same way that n-channel 
transistors do, except that the algebraic signs of all voltages and currents are reversed. 


Thus a p-channel, enhancement transistor has Vr <0 volts, while a_p-channel 


depletion device has Vr > 0. 


2.3: A Simple First-Order Model 


A very simple model of MOS transistor operation is developed in this section. The 
discussion will focus on n-channel devices, although p-channel transistors are readily 
modeled by a change of algebraic sign. Results arising from solid-state physics will be 
referenced, but used without derivation, in order to treat the transistor as a circuit-level 
lumped element. Many solid-state electrodynamic analyses of the MOS transistor are 
available in the literature [13, 21, 32, 47]; the interested reader is referred to them for 
more detail. 


When a positive gate-to-bulk bias Vag is applied to the MOS transistor, the 


induced electric field across the gate oxide attracts negatively-charged carriers 
(electrons) to the silicon dioxide surface, as shown in Figure 2.4. These electrons 
ionize the holes present at the surface, and begin to deplete the p-type bulk of 
majority carriers. If the bias voltage is made sufficiently large, the number of electrons 
at the surface exceeds the number of holes, and the surface is said to be inverted. So 
many electrons are available that the surface is effectively n-type, not p-type. The 
inversion layer at the surface forms an n-type "channel" from the (n-type) drain to the 


(n-type) source, allowing current to flow. 
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+Vegb 


p-type Bulk 


Figure 2.4 Field induced channel 


If the source is shorted to the bulk (Ves = VaR): and the drain-to-source voltage 


is kept very small, the total charge Q in the channel region is given by 


Qo = 


€ 
—— (Vag - Vp) WL (2.3.1) 
Ox 


Channel current is related to channel charge through the transit time Ue 


This discussion is after Muller and Kamins [33], pp. 350-354. 
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Transit time is simply the channel length divided by the drift velocity 


2 
ue Se a eee See (2.3.3) 
“HAE B»Yps 
The channel current is therefore 
i W fox Wy o_vyy (2.3.4) 
pe hie Te es” T) “ps 3. 


If the drain-to-source voltage is now increased, the channel potential is not 
constant, but increases from source to drain. This effect can be approximated by 


considering the entire channel to be at its “average” potential (Veg- Vpg/2)). This 


leads to a new channel charge Qo 


Ox 
Qo = = Weg - Vy -Ypg/2) WL (2.3.5) 
Ox 
and a new drain current 
/ W fox Wy yey /2)V (2.3.6) 
ps “nt 7 “es ~ T~%ps/) Yps ao: 


At constant Ves: equation (2.3.6) predicts the drain current to be a parabolic 


function of drain voltage; see Figure 2.5. The maximum of the parabola occurs at 


Vos = (Vag-V7). Beyond this drain voltage, equation (2.3.6) predicts unreasonable 


behavior, since the incremental conductance (slope) is negative. In fact, the drain 


current saturates at this maximum value, becoming independent of Vos due to a 


physical mechanism known as channel pinchoff [33]. 
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Figure 2.5 Idealized current - voltage characteristics 


Equation (2.3.6) is valid in the triode region (Vis < Vosat the saturation voltage 
is given: by 


Vv (2.3.7) 


Dsat * “es ~“r 


At drain voltages above V the transistor enters the saturation region, where the 


Dsat’ 


drain current is independent of Vos! 


(2.3.8) 


The constants in the preceding equations (e.g., Bhs Egy: Tae) are physical 


-18- 


parameters of the fabricated transistors. 
An equivalent formulation would replace the leading term of (2.3.6) with a single 


constant "K", yielding the familiar Shichman-Hodges model [40]: 


0 (a) if Vpca € 0 
fo tay. eM ie a yo Vie TB) EVA SV (2.3.9) 
ps) = eae Wee Vp anes) “ps DS = psat 3. 
K W 2 (c) ifVng > Vosat 
27 “as ~ Yr) 


Equation (2.3.9) models all three regions of operation: (a) cutoff, (b) triode, and (c) 


saturation. 
2.4: Sources of Error 


In the first-order model of the preceding section, the source was assumed to be 


shorted to the bulk, and Vy was independent of bias voltage. Measurements on actual 
transistors show that V> varies with the source-to-bulk voltage Vopi see Figure 2.6. 


This phenomenon, known as the "body effect", is incorporated into the Shichman- 
Hodges model by introducing the empirical parameters Vro gp, and y: 

172 
B 


Vr =Vrq + ¥ [Up +V¥g_)'? - (9)? J (2.4.1) 


A more serious source of error in the Shichman-Hodges model arises from the 
assumption that the channel voltage can be approximated by its average value 


(Ves - (Vp9 72). This approximation can be removed by writing an expression for the 


voltage drop across an infinitesimal length dy of the channel, at a distance y from the 


= 49% 


Vt 


v0 


Vsb 
Figure 2.6 The body effect 
source terminal. Manipulation yields 
Ing AV = “ty W QG(VE) AV (2.4.2) 
which can be integrated to give the drain current Ing [33]: 
WC AY V Vine/2)Vne - 
Hy 7 £ Soy (Yas - “eg - Yos/?) Vos 
! (2.4.3) 


Ds > 
2 1/2 3/2 3/2 
3 ( 2,0, ) [( 29, +Vns +Vog ) = (29, +Vop) } 


Equation 2.4.3 is valid in the triode region; current is assumed to be constant in 


the saturation region. The transition voltage is given by 
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2 

€.qN 2C 
= 2 fp ete ee, 2 1/2 _ 
Vosat = Yas ~ “re aili+ za, (Veg - Yep + gp_) 7" - 13 (2.4.4) 


Ox 


Notice that equation 2.4.3 explicitly contains terms for the "body effect" mentioned 
above, with coefficients that are physical constants of the fabrication process. It is a 
refinement of the simple theoretical mode! developed in section 2.3. 

By an appeal to channel pinchoff it was assumed that the incremental output 


conductance (d Ing 7 Vos) in the saturation region is zero. Actual transistors exhibit 


finite output conductance, which increases with increasing gate voltage. This effect, 
often called “channel length modulation", is shown in Figure 2.7. The empirical 
Shichman-Hodges model accounts for channel length modulation by including an 


output conductance parameter Ves similar to the Early voltage of bipolar transistors: 


Vos = Vosat ] 


Ve 


(gg -V7)* [1 + (2.4.5) 
Theoretical models have also been developed to account for finite output conductance 


[20]; one such model will be shown in section 2.6. 
2.5: Second Order Effects 


As MOS fabrication technology improves, it becomes possible to build smaller 
transistors. When a transistor’s dimensions are decreased, assumptions made in the 
derivation of the model equations (e.g., that threshold voltage is independent of device 
size) become less valid. This section discusses several examples of such second-order 
effects. 

Transistors with short channels (L < 94m) or narrow channels (W < 9m) have 


different threshold voltages than wide, long devices [1, 2, 8, 10, 48]; see Figure 2.8. 
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Ids 


Vds 


Figure 2.7 Channel length modulation 


Short-channel thresholds are lower because the depletion regions around the source 
and drain partially ionize the channel. As channel length is decreased, this additional 
ionization becomes an appreciable fraction of the total channel charge. Narrow 
channel transistors, on the other hand, have higher thresholds than wide devices [4]; 
this results from field implants diffusing into the channel region and raising the average 
bulk doping density [45]. 

Another effect that becomes more pronounced in short channel devices is the 
nonlinear velocity-field relationship. For low electric fields, equation (2.3.3) is 
applicable, and carrier velocity is proportional to the transverse electric field 


(E = -Vpg7L). However, at high fields, velocity becomes limited by carrier 


interactions with lattice phonons and eventually reaches a maximum value. This effect 
limits the drain current (and hence the gain) of short channel transistors, as shown in 


Figure 2.9. Two transistors with identical (W/L) ratios are plotted; one has L = 20 


-22. 


W,L 
(microns) 


Figure 2.8 Narrow- and short-channel effects 


microns (solid curves), and the other has L = 5 microns (broken curves). 
In the derivation of equation (2.4.3), it was assumed that surface mobility is 
constant. Experimentally, mobility is found to be a function of the vertical electric field 


across the gate oxide. As gate voltage Ves rises, mobility decreases, so the final 


current of an actual transistor will be less than that predicted by (2.4.3). Two different 


models for this effect will be presented in the following sections. 
2.6: Berkeley Model 


The example theoretical model, used for the parameter compilation experiments in 
chapter 4, is presented in this section. It was developed at the University of California 
at Berkeley, and is installed as the built-in model in the SPICE circuit simulator [44]. 


The symbol Cox is used to represent the gate oxide capacitance per unit area 


(= €ox/T ox): 
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Figure 2.9 Velocity saturation 


Threshold Voltage: 
= 1/2 
Vr = Vein + ¥g (2% +V cp) 


Nss 


+29, +(n- 1)(2p¢ +Vog) 


(Body effect coefficient): 


Ys = = (2qe, NSUB)'/? (1 - ag-ap) 
Ox 
_ Xd S\1/2 
as = 5 [(1 +25) 1] 
Ww 
- Xd D 1/2 
an = or [qa +257) 1] 


(2.6.1) 


(2.6.2) 


(2.6.3) 


(2.6.4) 


(2.6.5) 
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(Depletion region width): 


We =Xp (2p¢ +Vop 172 (2.6.6) 

Wp =Xp (206 +Vgp +Vpg )'7? (2.6.7) 
2€ 

Xx 2 y172 (2.6.8) 


o =" SNSUB NEFF 


= an s 2.6. 
qeilt+ (2.6.9) 


Narrow channel effects are accounted for by equation (2.6.9), in which » increases 
as channel widths decrease. This serves to raise the built-in voltage (equation 2.6.2), 
increasing the threshold. 

The input parameter XJ controls the modeling of short channel effects. Depletion 
regions around the source and drain serve to ionize carriers in the bulk and lower the 
threshold of short channel transistors. This is modeled using Dang's trapezoidal 
approach [10], where the width of the depletion regions at the source and drain are 


given by equations (2.6.6) and (2.6.7) respectively. 


Surface Mobility: 


E = Cox (Vac - Vr) (2.6.10) 
x nee GS "T = 
( UORIT. \UEXP 
POSE, (a) if E, > UCRIT 


Bett = (2.6.11) 
Ho (o) if E, <UCRIT 
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Mobility remains constant, at its low-field value of Bo: until the vertical electric field 
E. exceeds UCRIT. Above UCRIT, mobility decreases exponentially according to 


equation (2.6.11a). 


Drain Current: 


V_ = Min (VooVpsae) (2.6.12) 


Ww 
ae {Coy (Veg - Vain — 1,72) Vy - 


DS ~ (2.6.13) 
2 
3 Ys [206 Vy +V gp YY? - (295 +Vgq FD 


This formula is essentially identical to the theoretical prediction given in equation 


(2.4.3), except for the inclusion of short channel effects (via Ys) and narrow channel 


effects (via 7). When Vos exceeds Vosat? Ve becomes constant and the drain current 


saturates. 


Saturation Voltage: 


! 
VMAX = aa eta) 


W Coy Hert (Vag ~ Yan ~ ™sat~Ys | Ypsat +2%F *¥sa] 


2 
X, “VMAX X >, VMAX 
D D 1/2 
= +——— oo X [ ef (V = )] (2.6.15) 
eff P Hert D 2He ff DS Dsat 
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Lh =L- 2b 


; ; (2.6.16) 


Expressions (2.6.14) and (2.6.15) are nonlinear equations in the two unknowns Lott 
and Vn sat [44]. Velocity saturation (Figure 2.9) is accounted for by the parameter 


VMAX. Finite output conductance in the saturation region is introduced through the 


channel length modulation term Lage: 


The input parameters, most of which are physical constants of the fabrication 
process, are summarized in Table 2.10. (Only "NEFF" is an empirical adjustment). The 
actual Berkeley model implemented in SPICE is slightly more complicated, since it 
includes prethreshold conduction. However, this feature was not used in the 
compilation experiments, so the modeling equations are omitted here. 

The model program is relatively expensive to evaluate; including the calculations of 
the small-signal conductances, it requires 257 double-precision floating multiply 
operations, 125 divisions, 20 square roots, 6 exponentiation operations, and 4 logarithm 


evaluations. 
2.7: Mosaid Model 


The Mosaid mode! was developed to provide reasonable modeling accuracy while 
permitting high speed simulation. It is based primarily on the Shichman-Hodges 
equations (2.3.9), which are very fast to evaluate. Empirical parameters are introduced 
to account for short channel effects, while avoiding nested iterations (such as 


equations 2.6.14 and 2.6.15), 


Effective Gate Length: L eff = L- aly (2.7.1) 
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Description 


— 


i 


NSUB Substrate doping density 


Sty | Meters Lateral diffusion & poly over-etch 
NEFF Dooce ate al Total channel charge coefficient 


za Meters2/Volt-Second | Low-field surface mobility 
UCRIT Volts/Meter 
Meters/Second 


Table 2.10 Berkeley model parameters 


N Meters~2 Density of surface states 
2 
T 
»¢ 
L 


ss 
Pr 
Ox 
J 
D 
a) 


Critical field for mobility degradation 


Exponent of mobility reduction 


Scattering-limited carrier velocity 


Threshold Voltage: 


Vos 


Ve =Vrq tL (p +Vgp)'/? - (g)'”7] - Kp Te (2.7.2) 


The Shichman-Hodges mode! (Equation 2.4.1) is used to account for the body 


effect, with the addition of short channel threshold reduction through the parameter 
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Ky. Narrow channel threshold effects (Figure 2.8) are not modeled. 


Effective Gate Drive: Ve = Ves -Vr (2.7.3) 
Effective Gain Factor: K eee (2.7.4) 
eff “14 OV, 


In this formulation, effective mobility is decreased even at low vertical electric 
fields, unlike the two-region model implemented in the Berkeley model (equation 
2.6.11). This equation has been found to yield acceptable accuracy while permitting 


fast evaluation [9, 10, 47]. 


Drain Current: 


(a) itv, <0 


Kort 


I (2V.-Vpg) Mp5 (8) Mpg § Vagar (2-7-5) 


OS Lett t2Ker¢RgVpg 
K_,,W (c) IVs 2 Vpsat 
eff Vp 2 
tere Cups -Ypsat) 754! 
2V, 
Saturation Voltage: V ed (2.7.6) 


Dsat 
1 4[1 +(4Ko4pRe YI le 


Velocity saturation effects (Figure 2.9) are modeled by postulating a feedback term 


R with the dimensions of resistance, such that the gate drive Ve é is reduced as drain 


current increases: 


Vee = “as ~¥7 -'psh 
To make the R term geometry-independent, it is replaced by Ro/W in the Mosaid 


model. The reduced gate drive is substituted into the Shichman-Hodges model 
(equation 2.3.9b), giving the drain current (equation 2.7.5b). [KP in the Mosaid model 
is equal to K/2 in the Shichman-Hodges model]. 


The saturation voltage Vosat is found by setting Alns/8Vng equal to zero, which 


gives equation (2.7.6) 


Vv 
Channel Length Modulation: ae ae (2.7.8) 
ae (Vos my “psat) ) (2.7.9) 


The original Mosaid model used a constant value of Cuo in 2.7.5(c); this gives rise to 


a slope-discontinuity at V 


Dsat ‘ in the linear region the slope (8lng/8Vp5) tends 


toward zero at Vpsat’ while in the saturation region the slope remains finite at Vo sat’ 


Since the SPICE circuit simulator uses first derivatives in its Newton-Rhapson 
computations, a discontinuous slope can cause simulation nonconvergence, so the fix 
described by equations (2.7.8) and (2.7.9) was inserted. Parameters for the Mosaid 
model are summarized in Table 2.11. 

The Mosaid mode! is comparatively inexpensive to evaluate; it requires only 36 
double-precision floating ‘multiply operations, 12 divisions, 6 square roots, and 2 


exponentiation calculations. 


fs [ati wae Lateral diffusion & poly over-etch 
Zero-bias threshold 


— Volts!72 Body effect coefficient 


Short channel threshold coefficient 


1/Volts Mobility degradation coefficient 


ara Ohm - Meters | Velocity saturation feedback constant 
hee Meters / Volt | Channel length modulation constant 


Table 2.11 Mosaid model parameters 
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Chapter 3: Minimization Algorithms 


After a mathematical model of transistor behavior is selected, parameter values 
must be determined which give the best fit to a particular set of measurements. 
Serious modeling errors can -occur if the parameter values are poorly chosen. To 
insure that the best possible modeling accuracy is maintained across the entire range 
of transistor operation, it is desirable to create a set of "optimally good" parameter 
values. 


Let P be an n-vector such that Py is the value of the k-th model parameter. 


Suppose there exists a function f: R” +R! such that f(P) is a measure of the 
modeling error incurred when the parameters P are used. Then the optimum parameter 
values exist at the point P where f(P) is smallest. 

The problem of finding the minimum value of a function has been extensively 
studied [12, 16, 18, 30]. This chapter presents five of the better-known minimization 
algorithms. Together with a suitable function f(P), they can be used to construct 
programs for compiling optimal sets of model parameters. Detailed derivations of the 
individual algorithms are not given, nor are convergence properties proven, since the 
emphasis is on their application and not their design. In each case, the original 
papers are referenced for the reader who wishes to study the algorithms in greater 
detail. 

It should be noted that these algorithms are designed to find /oca! minima; no 
computationally tractable algorithm is known for finding the global minimum of an 


arbitrary function. 
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3.1: Notation 


To simplify the discussion of the algorithms, a common notation has been used 


throughout the chapter. The function f(x): R” +R is to be minimized, and the 


n-vector xj is the starting point for the jth iteration. A displacement Ax; is made from 


the present point xj to the next point Xi rT 


x = X, + Ax, (3.1.1) 


i+ j 
The displacement can be written in the form 
Ax; = -@.p, (3.1.2) 
where p p an n-vecior, denotes the search direction and a; a scalar, is the stepsize. 
In general, the symbol 


Vit (x) (3.1.3) 


ie] 


will be used for the gradient of f, where g is an n-vector. Also, 


G 


V2 F(x) (3.1.4) 
is the nXn matrix of second partial derivatives. For scalar functions, the symbol 
t(k)(x) will be used to indicate the k-th derivative of f, evaluated at x. 


3.2: Hooke and Jeeves Algorithm 


This algorithm is based on the extremely simple idea of varying one coordinate at 


a time. From an initial point Xo: a step of length 6 is taken along coordinate 1. f is 


evaluated at (XO): (Xa +6 ey); and (X gg - 6 e,), where e, is a unit vector along 


coordinate 1. Whichever of these three points gives the lowest value of f is then 


named Xo and the process is repeated for all n unit vectors, finally producing Xon 


which completes iteration 0. 

When a displacement is made that reduces the function value, further steps along 
the same direction will probably continue to decrease f. The Hooke and Jeeves 
algorithm [26] attempts to exploit this property, using a successful direction of descent 
as a first i for further exploration. For this reason, the algorithm is also called 
"pattern search" in the literature. Instead of starting iteration 1 at the endpoint of 


iteration 0, the displacement vector is doubled in length: 


+ 2x (3.2.1) 


X19 = XQ9 on ~ Xoo) = 2%on - Xoo 


The algorithm’s progress on minimizing a function of two variables is shown in Figure 
3.1, 

As the algorithm proceeds toward a minimum, the stepsize 6 eventually becomes 
too large and must be reduced. It is replaced when an iteration fails to reduce the 


function value: f (x > f (x;q)- A simple geometric formula is used: 


ind 2 


58° = pd (p < 1) (3.2.2) 


The algorithm terminates when the value of 6 is reduced below some minimum stepsize 


Eup The implementation of this algorithm described in chapter 4 uses 


a, —4 
Ey) = 10 
rt) 0 = 0.1 (3.2.3) 
p = 04 


Figure 3.1 Hooke and Jeeves algorithm 


3.3: Simplex Algorithm 


This algorithm is based on topological generalization of a triangle’. Just as a 
triangle is a two-dimensional object having three vertices, a simplex is an 
n-dimensional object having n+ 1 vertices. A tetrahedron, for example, is a simplex in 
3-dimensional space. 

From a starting simplex of n+ 1 points, the point at which f is highest is reflected 
through the hyperplane formed by the other rn points. If the function value at the new 
point is lower, the simplex has been improved and ‘the procedure can be repeated. 


During the minimization procedure the simplex changes in size and shape, adapting to 


Tepes aes chee ek ee ee . se . . - im Wea 
This algorithm for function minimization should not be confused with Dantzig’s "sim- 
plex algorithm" for linear programming. 


the local contour of f. The algorithm is given below using the notation of Fidler and 


Nightingale [14]: 


Xr is the vertex which gives the highest value of f(x) 

Xg is the vertex which gives the second highest value of f (x) 
x) is the vertex which gives the lowest value of f (x) 

Xo is the centroid of all vertices xy other than Xy: 


: 1 
Xo = rs =X, (k #H) (3.3.1) 
Three different operations may be performed on the simplex. 
(1) The vertex xX, May be reflected through the centroid to give Xp! 
Xp = (1 + a)Xo - ax, (3.3.2) 


(2) The simplex may be expanded along a favorable direction: 
Xp = Xp + (1- Y) Xo (3.3.3) 


(3) The simplex may be contracted: 
Xo = Bx,n + (1- B)xo (3.3.4) 
These operations are used to find the minimum value of f in Algorithm 3.2 [14, 
35]. 


The algorithm is terminated when the standard error falls below a given threshold Eo! 


=12 , 


- Eo (3.3.5) 


Since the vertices of the initial simplex may be widely separated, the algorithm 
essentially has n+1 independent starting points. The impact of an especially poor 


starting point is therefore reduced. 


[1] The distinguished vertices Xp Xoa Xp and x are computed. 
[2] Xp is computed according to (3.3.2). 


[3] If F(xp) SF), but F(x) > F(x, )s an improvement has 
been made, so x,, is replaced by Xp and another iteration can 


begin at step [1]. 


[4] lf f (Xp) ¢ f (x, ); the reflected point has a lower function 


value than all other vertices. This direction appears to be favor- 
able, so the expansion (3.3.3) is attempted. xy is replaced by 


whichever of Xp OF Xp gives the lower function value, followed 
by a return to step [1]. 


[5] If F(xp)< #(x,,), but f(x_)> F(xo), only a minor improvement 


has been made, and a contraction is performed using equation 
(3.3.4) in case the reflection has overshot a better point. Xe is 


replaced by whichever of Xp OF Xo gives the lower function 
value, and the algorithm returns to step [1]. 


[6] If f (xp) > f (x1,); no improvement has been made by reflec- 


tion. The implication is that the minimum probably lies within the 
simplex, so the simplex is shrunk about the lowest point x): 


x, = (x, +x, ) 72 
Algorithm 3.2 The simplex method 


Chapter 4 describes an implementation of the simplex algorithm which uses 


a -11 
Es = 10 
a = 10 
B = 0.5 (3.3.6) 
y = 20 
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3.4: Newton's Iteration 


The Hooke and Jeeves and simplex algorithms operate on function values only; no 
derivative information is required. If the first and second derivatives of f are available, 
more powerful algorithms may be applied, giving faster and more reliable convergence. 


These algorithms use a first-order Taylor series expansion of f about x4 In the one- 


variable case, 


F(x) Bf (x4) +o- xy) f Me) (3.4.1) 


9) 
A point Xo is not a minimum unless 


df 


2) = 0 (3.4.2) 


Maximum points and inflection points also satisfy (3.4.2); local explorations about Xo 


are usually performed to verify that it is indeed a minimum. 
An approximate solution to (3.4.2) can be found by differentiating (3.4.1) and 


setting the result equal to zero. 
0 = aly ) [ - ot ) . (3.4.3) | 
ae x4) + Ko- x,y 3x2 x4 A, 


Equation (3.4.3) has the solution 


#1) (x4) 
ona Pees ena (3.4.4 
2 1 (2) (« 4) 


which is known as Newton’s iteration. 


In the multivariable case f:R” rR), the first-order Taylor approximation is 
f(x 5) ~~) f(x 4) +9! [x5 - x4] (3.4.5) 


where g is the gradient of f at X4. The derivative of equation (3.4.5) is set equal to 


zero, giving a minimum at 
Xp 2x, - Guly (3.4.6) 


where G is the matrix of second partial derivatives at x4: In the terminology of (3.1.2), 
the search direction P, is given by (e71 g), and the stepsize a, is equal to one. 


Equation (3.4.6) is used to calculate the displacement Ax; = -6,"" Qj. which 


provides the starting point for iteration (j + 1). Iterations are performed in this fashion 
until convergence is achieved. The algorithm is considered to have converged when 


the gradient is approximately zero (see equation 3.4.2): 
< 3 
99S Epy (3.4.7) 


Newton's iteration has the desirable property of quadratic convergence: if f is a 
quadratic function, the minimum point is achieved in a number of iterations at most 
equal to the number of variables [27]. Since an arbitrary function can be closely 
approximated by a quadratic over a small interval, the algorithm will converge equally 
rapidly on arbitrary functions, providing the starting point is close to the minimum. This 
property assures rapid convergence in the final stage of computation. 

Newton’s iteration (3.4.6) has limited usefulness as a practical minimization 


algorithm, because it requires the user to supply formulae for calculating the n first 


derivatives in the vector g, and the (n 2) second derivatives in the matrix G. Frequently 


this is inconvenient (or impossible), as for example when the number of coordinates n 
is large. Variations of the basic Newton iteration have been developed which do not 


require second derivatives; they are much more useful than the original formula (3.4.6). 


These quasi-Newton algorithms use approximations of G or G~1 to eliminate the need 


for second derivatives. 
3.5: Quasi-Newton Algorithms 


Huang [27] has presented a class of quadratically convergent algorithms which 


replace Newton’s iteration (3.4.6) by the approximation 


where the nXn matrix H characterizes a particular algorithm. (In all cases, H is 


computed without using second derivatives). The initial matrix Ho is taken to be the 


identity matrix. Several well-known and highly successful minimization algorithms are 
members of Huang’s class, including the conjugate-gradient algorithms and the variable 
metric algorithms. 


Once the search direction P; is known, the stepsize a; must be chosen. For good 
progress toward the minimum of f, the value of a, should be selected which minimizes 


the scalar function u(a): 


u(a) = f(x;- ap,) . (3.5.2) 


Very efficient algorithms are known for univariate minimization, so the optimum a can 


quickly be found. This determines the displacement Ax; and hence the new point 
x; +1 and another iteration can begin. The algorithm terminates when the gradient is 


sufficiently small, similar to the Newton termination criterion (3.4.7). 


3.6: Conjugate-Gradient Algorithm 


A particularly appealing member of Huang’s class is the conjugate-gradient 


algorithm, studied by Polak and Ribiere [37]. It uses the approximation 


oe | 
P;_ 4(9;_4- 9;) 
ie OP ee (3.6.1) 
T 
9j-1..9j-1 


where /.is the identity matrix. Substitution into (3.5.1) yields 
PD; = 9; + YPi_4 (3.6.2) 


where the scalar y is given by 


(9,47 9,)'9, 
Wie: eee (3.6.3) 


T 
9}. 19 j-1 

This substitution shows that the matrix Hj need not be stored directly; only the vectors 

Pi  P i’ g j-v and g j are required. The Polak-Ribiere approach is therefore 


attractive for very large problems, where the storage cost of an nXn H-matrix would 
be prohibitive. 


Equation (3.6.2) gives the search direction p p and the steplength a; is computed 


by univariate minimization. Iterations proceed in this fashion until the gradient vector 


approaches zero: 


gg St, (= 10719) (3.6.4) 


< AL 


3.7: Fletcher's Switching-Policy Algorithm 


This algorithm combines the well known Davidon-Fletcher-Powell [17] and 
Broyden-Fletcher-Goldfarb-Shanno [23] variable metric procedures, giving a hybrid 
algorithm which is superior to each [15]. 


The DFP algorithm is a member of Huang’s class, characterized by 


a ee ok Le ok 


H, = H, (3.7.1) 
-1 
j j ay YH ay 
i] 
where 
6 = x,- x, 
-1 
2 a : (3.7.2) 
a io 
The BFGS algorithm is also a member of Huang’s class: 
Sy!H. H, yo! YH Y ig 
Hi, = Hoy> SO - Se bs (3.7.3) 
by aly bly Bly 


Fletcher's switching-policy algorithm uses either (3.7.1) or (3.7.3), depending on the 


outcome of a test: if 
sy > WH yy (3.7.4) 


then the BFGS update formula (3.7.3) is used for iteration /. Otherwise, the DFP 
update (3.7.1) is used. Fletcher has presented performance data indicating that this 


hybrid algorithm is faster than either of its component parts. The hybrid approach, he 
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claims, avoids near-singularity in the H matrix, making the algorithm more stable [15]. 
lf the H matrix is not positive definite, then stability and convergence to a minimum 
are much less likely. Fletcher’s algorithm contains a safety test which prevents H from 


becoming nonpositive definite. Whenever 
Bly < ey (= 10°") (3.7.5) 


the algorithm is "restarted" by setting H j to the identity matrix. The algorithm is 


terminated when the gradient becomes sufficiently small: 


g'9 < ee (= 107"9) (3.6.4) 


3.8: Line Searches 


After the search direction p is determined, the line (x - ap) is searched to find 
the optimum value of a. In this section, two methods are discussed for performing a 
univariate search. Both algorithms proceed by finding an interval (a, b) which is 
known to contain the minimum point x. The size |b - a| of this "bracket" is then 
decreased repeatedly until a minimum is found. 

A simple criterion may be applied to find an initial bracket: at least one local 


minimum exists in the interval (a, b) if there is a point c within the interval, such that 
(agece<b) and [f(c)< f(a)] and [f(c)< f(b)] (3.8.1) 


A bracket satisfying (3.8.1) may be easily found using the algorithm of Davies, Swann, 
and Campey [6]; the procedure is illustrated in Figure 3.3. From the initial point a = 0, 


a step of size 6 is taken in the +a direction. Stepsizes are repeatedly doubled until 


f(alpha) 


alpha 


1 2 4 - (8 * delta) 


Figure 3.3 Establishing an initial bracket 


the minimum is overshot and condition (3.8.1) is met, establishing the first bracket. 
Once the initial bracket is found, a simple algorithm for converging to a minimum 

is the Golden Section search, illustrated in Figure 3.4. New points c and d are 

selected, which divide the original interval into three pieces: (a, c), (c, d), and (d, b). 


The function f is evaluated at c and d, and the reduced interval is given by 


(a",b°) 
(a”, b’) 


(a, d) if f(c) <t(d) 
(c, b) if t(c)> f(d) 


(3.8.2) 


The size of the bracket has been reduced by the fraction ad/ab (or cb/ab). To keep 

the algorithm unbiased, these fractions should be identical. Choosing c and d such 

that cb = ad insures that the bracket will shrink by the same amount on either side. 
Two function evaluations were used in the simplified iteration described above (at 


c and da). The Golden Section algorithm actually uses only one evaluation per 


f(alpha) 


fo old bracket 
k— * new bracket —y 


alpha 


a* c* 8 d* b* 


Figure 3.4 Golden Section search 


iteration, through careful selection of the points c and d. If the new bracket is, for 
example, (a, d), then the old point c lies within the bracket, and f(c) has already been 
evaluated. The point c should therefore be reused, so that only one new point needs 
to be evaluated in the next iteration. 
If the ratio ad/ab is equal to the ratio ac /ad, then the point c can be reused and 
the resulting partition of the bracket will be unbiased, as shown in Figure 3.4. 
Normalizing to ab = 1, let y = ad so that (1- y) =ac. Then 
Ys — (3.8.3) 
This equation produces the Golden Section ratio used in Greek architecture (hence the 


algorithm’s name). The solution is 


1/2 
y= Q + = 0.61803 (3.8.4) 


Two function evaluations are performed for the first bracket. Each subsequent bracket 


uses equation (3.8.4) to compute one new point, so only one function evaluation is 


required per iteration. After n iterations, the bracket has shrunk to (0.61803)” of its 
original size. The search is completed when the bracket has been reduced to a 


sufficiently small size 


Jb - al S & (= 10715) ; or 


(3.8.5) 
|b- al 


Table 3.5 compares the Golden Section algorithm to a simple binary search. On 
each iteration, binary search halves the bracket, but two function evaluations are used. 
The table shows the number of function evaluations required to shrink an initial bracket 
of size |b - a| = 1 down to various final sizes; binary search takes approximately 1.4 


times as many iterations as the Golden Section search. 


Line Search Final bracket size 


[Binary searcn | eo | 14 | 20 | 20 | o4 | a0 | 
Golden section | 5 | 10 | 15 | ao | 2 | 20 | 


Table 3.5 Function evaluations required for bracket reduction 


Another popular line search algorithm uses a quadratic interpolation scheme for 
finding the minimum [22]. Given an initial bracket (a, b) and a point c within the 
bracket, a quadratic is fitted to these three points. The minimum of the quadratic is 


given by [38]: 


[b?- c2jt(a) + [c2- aIt(b) + [a2- b@IF(c) (3.8.6) 
[b - c]f(a) + [c - a]f(b) + [a —- b]f(c) 


hol 


A new bracket is then formed, taking the three lowest points found so far (which still 
maintain a bracket). Another interpolation is computed, and the process is repeated. 
Near the minimum, a quadratic interpolation will be very close to any arbitrary function, 
and the final stages of convergence will be very fast. 

After evaluating both line search algorithms, the Golden Section method was 
selected for use in the minimization implementations described in chapter 4.. When an 
initial bracket contained several minima (a common occurrence), Golden Section 
appeared to converge more rapidly than quadratic fitting; presumably this is because a 


quadratic is a poor approximation to a function with more than one minimum. 
3.9: Numerical Differentiation 


In many cases, analytic derivatives of f(x) are not available, so they must be 


computed numerically. For example, the derivative OI no/ @VMAX for the Berkeley 


model does not exist in closed form (see equation 2.6.14). A finite difference formula 
can be used to estimate the derivatives: 
0 od eee 3.9.1) 
Ox, ) 
/ 
lf f(x) is known, this forward-difference formula requires only one additional function 


evaluation to compute the derivative. A central-difference approach would need to 
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evaluate f at both (x + 8e)) and (x - de); this is too expensive for use in a 


minimization algorithm. The size 6 of the finite interval is chosen to reflect the scale 


of the component x; 


5 = max (hx, 8°) (3.9.2) 
where 
h erie 
; (3.9.3) 
8° = 10718 


3.10: Marquardt’s Algorithm 


Suppose a physical process is modeled by the equation 
y = M(V,x) (3.10.1) 


where y is the dependent variable, V = (V,, Ve 55 Vi) are the independent variables, 
and x = (x4, Xo oon xy) are adjustable parameters of the mathematical model M. A set 


of m experiments is made, obtaining values of y for different values of V. The problem 
is to find those values of x which give the experimental data the best fit to (3.10.1). 


The best fit in the least-squares sense is obtained by defining the m residuals: 


Rix) = MV, x) - yj (3.10.2) 
and minimizing f, the sum of squares of these residuals [28]: 


f(x) = SR(x)? = RO) R(x) (3.10.3) 


Special purpose algorithms for minimizing sum-of-squares functions such as (3.10.3) 
have been developed; the most popular of these methods is due to Marquardt [31]. 


Newton's iteration can be applied directly to (3.10.3), giving 
G(x) Ax = - g(x) (3.10.4) 
The gradient g(x) can be expressed in terms of the vector A of residuals: 
g(x) = 2U(x)/R(x) (3.10.5) 


where J(x) is the m Xn Jacobian matrix with 


Ji(x) = : (3.10.6) 


The matrix G of second derivatives is given by 
G(x) = 2K)TU) + 2 V2R(x) R(x) (3.10.7) 


where VR Ix) is the second derivative matrix of R(x) [28]. 


Equation (3.10.7) shows that a substantial part of the matrix G is obtained by 
using only first derivatives. Near the solution, if the residuals are small or almost linear, 


then the second term of (3.10.7) is negligible. This leads to the approximation 
G(x) & 2(x)Tu(r) (3.10.8) 
which can be substituted into Newton’s iteration (3.10.4), giving 


JOx)Tu (x) Ax = = J(x)TR(x) (3.10.9) 


Equation (3.10.9) is known as the Gauss-Newton algorithm. 
In practice it has been found that (3.10.8) is sometimes a poor approximation, 
leading to divergence of the iterates. Marquardt suggests that a "damping term" (A/) 


should be added to the Gauss-Newton algorithm [31]: 


(vty +Al)Ax = - a/R (3.10.10) 


where / is the identity matrix and A> 0 is a variable parameter of the algorithm. 
On any particular iteration, J and R are fixed, so that Ax can be considered a 


function of A. When A = 0, Marquardt’s algorithm reverts to the Gauss-Newton formula 


(3.10.9). As AO, then Ax > -JTRA, which is an incremental step along the 
direction of steepest descent of f. Note that divergence is impossible with suitably 
large values of A. 

At iteration j, Marquardt’s algorithm solves (3.10.10) for Ax, using increasingly 


larger values of A until a displacement Ax is obtained for which F(x; + Ax)< f (x;). 


Typically X is increased in powers of ten until a function decrease is obtained: 


* 


A = 10A (3.10.11) 


The algorithm is halted when the displacement Ax is negligibly small 


Ax 
—_—_— 10.12 
T+ Ix; | S &y S 


or when the value of A exceeds a preset threshold 


A > A (3.10.13) 


- 50 - 


The implementation of this algorithm described in chapter 4 uses 


Tu 1073 
= 4103 (3.10.14) 


A = 1011 
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Chapter 4: Automatic Parameter Extraction 


Parameters for empirical MOS models are usually generated by a tedious set of 
hand calculations, followed by an iterative procedure of "tweaking" the numbers until 
an acceptable curve fit is obtained. Although it has been used successfully for many 
years, this method suffers from at least three major drawbacks: (1) it is time consuming, 
(2) it is prone to errors of omission or oversight, and (3) its termination criterion is 
qualitative rather than quantitative. This chapter discusses the computer 
implementation of numerical algorithms. for parameter extraction, designed to eliminate 
these drawbacks. Prototype versions of parameter compilers (using these numerical 
algorithms) are presented. The performance of these compilers, measured in terms of 
modeling accuracy and compilation speed, is detailed. 

Since transistors have several different regions of operation (Cutoff, Triode, 
Saturation) and geometry-dependent behavior (short and narrow channel effects), many 
measurements are needed to adequately characterize device operation in all regimes. 
These regions are arbitrarily constructed for ease of explanation and understanding; 
transistor behavior is actually a continuum, with certain effects more pronounced in 
some areas than in others. This leads to difficulties in parameter extraction, because 
(erroneous) assumptions are often made that device behavior is localized within these 
regions. For maximum accuracy, the entire operating range must be considered when 
extracting parameters, because they interact in the final model. Unfortunately, manual 
parameter extraction methods assume that (some) parameters are independent, and 
that their effects are localized; this necessitates "tweaking" the final numbers to 


obtain acceptable accuracy. 
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4.1: Manual Methods 


To illustrate the methods used in manual parameter extraction, a manual extraction 
will be performed for the Mosaid model of section 2.7. The parameters of long, wide 
channel transistors will be extracted first [44]; the model fit for these large devices 
should be very good, because they will be closely approximated by the first-order 
theory. Then small-geometry effects will be accounted for, taking care to isolate the 
effect of each parameter, so its value can be calculated independently from the others. 


The first parameter to be measured will be the threshold voltage Vy of a long, 
wide channel device. Equation (2.7.56) shows that if Vos is very small, drain current 
is proportional to Ves - V7 ~Vp9/2. Measurements of drain current are taken at 
various values of gate voltage Ves) current begins to flow at Ves 7 Vy + Vos9/2: 


This is shown in Figure 4.1. Drain current is not precisely linear with gate voltage, 


because the surface mobility (Hors) begins to fall as Ves rises. V; is found by 
extrapolating to Ing = 0 from the maximum-slope portion of the curve, to insure that 


degraded mobility is not inadvertently included in the threshold voltage. 
Figure 2.6 shows that threshold voltage is a function of source-to-bulk voltage 


Vogi by repeating the measurement of Figure 4.1 at several values of Vog> the body- 
effect parameters g and y can be computed. The parameter Vro is simply the 


measured value of Vy taken at Vog = 0. Rearranging equation 2.4.1, 


Vp-Vrq = YP +¥gg)'7? - y(p)'/? (4.1.1) 
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Figure 4.1. Measurement of Vr 


Initially taking m=0.6 (a good approximation for silicon devices)', a change of 


variables can be made: 


yoo Ve =Vro 
nay (4.1.2) 
b = -y(9)'7 
Equation (4.1.1) is transformed into the familiar linear equation 
y = mx + b (4.1.3) 


gy represents the Fermi potential of the silicon surface, 29¢- It is proportional to the 


logarithm of the bulk doping density, and its value is usually close to 0.6 in typical 
silicon MOS transistors. 


Using the trial value of 9, the measured data can be plugged into equations 
(4.1.2) and (4.1.3), extracting values for m and b with a simple linear regression. This 
computation is easy to perform, and is implemented on several hand-held scientific 
calculators [24]. 

The new value of g is inserted into equation (4.1.2) in place of the initial guess, 
and another linear regression is performed. Regressions are iterated in this fashion 
until the values of gm and y stabilize (say, to within 0.1%). 

After extracting the model parameters for threshold, the gain parameters KP and @ 
are computed. Drain current measurements are taken on long, wide devices in the 


saturation region (where equation (2.7.5c) applies). Solving for KP, 


KP =[1 + 0(Vgg - Vz) a DS (4.1.4) 
Ves 7 V7? 
which suggests the substitution 
£ Ips 
y = WwW 
Was") 
yay - Wes - 7) 'ps 
2 
(Ves - ¥7) 
(4.1.5) 
ms @ 
b = KP 


The measurements are plugged into (4.1.5), and a linear regression is performed, 
giving @ and KP. 


The  channel-shortening parameter Lp can be found by measuring 
transconductance (8lp5/8Veg) as a function of mask channel length L. Two 


transistors with identical channel widths and different channel lengths are operated at 


low drain voltage, and measurement of lng are taken at several values of Ves: 
Transconductance is the maximum value of the slope of this curve; see Figure 4.1. 


Under these conditions, 


Const 


08 ET TAL. 


(4.1.6) 


where AL = (-2Lp ). 
Measured values Slope, and Slope, are taken on devices with mask channel 


lengths Ly and Lo. The two resulting equations (4.1.6) are divided, giving the ratio 


Slope, L o+ AL 


se ee Pe 4.1.7 
Slopes Ly + AL ( ) 


Solving for AL, 


L 5 Slopeg - Ly Slope, 
AL = ———————______—_ (4.1.8) 
Slope, - Slope 


At this point, model parameters Veo y, yp, KP, 8, and Ly are known for long, wide 
transistors. The smail-size parameters are extracted next, beginning with Kp. 


Rewriting equation (2.7.2), 


kK. = weft 
im 
Yos 


{Vrg-Vz typ +Vgg)'7? - (@)'74] 3 (4.1.9) 


At a_ constant Vos: threshold measurements are made for several values of 


channel length Loge: Each data point is plugged into equation (4.1.9), giving a value 


of K D° These are averaged to produce the final K D: 
To extract a value of Ro, drain current measurements are taken in the triode 
region. The current equation (2.7.5c) is solved for Roy giving 
ee Rett Y Vos ("9 Ys). Lor] (4.1.10) 
2 Kort ’ps lps 
Several Ips measurements are taken on short channel transistors in the triode 
region, and values of Ro are then computed from (4.1.10). The final value of Ro is 
taken as the average of the results from (4.1.10). L eff? Katte and Ve values used in 


these calculations should be computed from equations (2.7.1), (2.7.4), and (2.7.2) 
respectively. 


Cryo can similarly be extracted from /,~. measurements made on short-channel 


devices in the saturation region. Equation (2.7.5c) is manipulated to yield 


2 
Bone 1 I Kort “ Vosat 
MO ~ & 
VDsat Vos ‘ 


-L (4.1.11) 


] 
DS eff 


Vosat is computed from equation 2.7.6 and known values of the other parameters. 
The final value of Cuo is taken to be the average of the individual values computed by 


equation (4.1.11). 
Several simplifying assumptions were made in the parameter extraction recipe 
given above, resulting in increased modeling errors. For example, equation (4.1.6) 


ignores the Ro term in the denominator of equation (2.7.5b) by assuming that drain 
current is inversely proportional to (L + AL). More significantly, the values of Veo y 


gy, and Kp are derived from indirect measurements: first a quantity called "Vy " is 


ss dee 


computed, and then parameter values are fitted from threshold data. 


Actual MOS transistors do not abruptly turn off at Vas = Vr as shown in Figure 


4.2. Below threshold, drain current is an exponential function of gate voltage [42, 48]. 
The Berkeley model includes an optional term to account for this "prethreshold 
conduction"; it was omitted from the discussion in chapter 2 because the option was 
turned off in the compilation experiments. (The Mosaid model does not account for 
this current). The gradual transition from "off" to "on" makes the task of selecting a 


unique Vr difficult, so parameter values extracted from such measurements will have 


large uncertainties. 


log(Ids) 


Vt 


Figure 4.2 Prethreshold conduction 


4.2: Numerical Techniques 


The manual parameter extraction technique presented in the last section can be 
thought of as a sequence of minimizations: in each case, values for parameters are 


chosen so as to minimize modeling error. This is accomplished by using linear 
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regressions or by simple averaging. Unfortunately, the errors introduced in each of the 
individual extractions can accumulate, leading to a higher total modeling error (over the 
entire range of device operation). 

The single-pass nature of the manual extraction technique does not permit residual 
model errors to be spread among several parameters, which could lead to an overall 
error reduction. Since each parameter is extracted in a small region of the total device 
operating range, with this scheme it is impossible to distribute errors over many such 
regions. In this section, numerical minimization algorithms (presented in chapter 3) will 
be used to extract all parameters simultaneously. 

The objective function to be minimized will be a measure of the modeling error, for 


example the normalized error at each data point 


— Ipgmli) = pgp PM) 


= 2. 
ips) il 


At the /th data point, the modeling error e ; is the normalized difference between the 
measured current losm (/) and the model’s predicted current lpSp (P)(/). (The notation 
Ins 0 (P)(/) stands for the predicted value of Ing at the jth data point, using the vector 
P of model parameters). Measurement j is taken on a transistor of size (W,ZL jp at 
Veg li Vpg li. Vogl) )- 


If the model current is a continuous, differentiable function of the parameters, then 


- at each point. Quasi-Newton minimization algorithms, 


so is the modeling error ej 


which require require the objective function to be differentiable, may therefore be used 
with equation (4.2.1). 
A simple measure of the total modeling error with parameter-vector P is the sum of 


the squares of the n individual errors 
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f(P) =e,7 +e, + rae +e,2 (4.2.2) 


which is a differentiable function of P. 

The distribution of measurements about the total operating range of the transistor 
will affect the accuracy of the final model. For example, if the vast majority of 
measurements are made in the linear region, model accuracy in the saturation region 
will be impaired. Regions of ‘operation are, therefore, implicitly weighted in the error 
formulation, according to their frequency of appearance in the measurement data. if 
the measurements are clustered in a small region, the error will be lowest in that 
vicinity. Large errors at a few outlying points will be tolerated, because their 
contribution to the final total is small. 


Explicit weighting is also possible [46], giving a total error function 
f(P) =w4e,? tween? + 1) twee 2 (4.2.3) 


Weights might be explicitly applied if it is desired to emphasize modeling accuracy 
in certain crucial regions of operation. Although implicit weighting could be used 
(simply by including many data points in those regions), explicit weights are more 
suitable, because they require fewer total data points (and fewer evaluations of f). 

lf the measurements are made on physical instruments, some accuracy will be lost 
at very low currents. It is desirable to prevent the extraction algorithm from generating 
faulty parameter values due to noise in these measurements. A modified error function 


can be used to set a "current floor"; below this floor value Leni measurements are 


given less weight in the overall error function [46]. 


! jy-/ P)U 
oe fosm) ~ ‘psp? M) (4.2.4) 
Fo max Pg) ' min! 


A floor value is also useful if the model to be extracted does not account for 
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prethreshold conduction. 

Several of the algorithms presented in chapter 3 are sensitive to the scale of the 
variables; if the elements of the parameter-vector differ by orders of magnitude, severe 
loss of accuracy will occur [31]. For this reason the parameter vector is approximately 
normalized; its elements are divided by typical values, insuring that all parameters are 
near unity. The parameters are of course de-normalized for the evaluation of the 
actual modeling routine. 

The organization of a model parameter extraction program is diagrammed in Figure 
43. First, n transistor measurements are input; they include the measurement 


conditions (W, L, Vas: Vos: Vog) and the resulting drain current Ins: A numerical 


minimization algorithm is then used to perturb the parameter vector P until the error 
function f(P) of equation (4.2.2) is minimized. For each measurement, the parameters 
P and the measurement conditions are input to the empirical MOS modeling routine, 


giving a predicted drain current 'psp° Equation (4.2.4) is used to find the error for 


that measurement, which is summed with the others to form f(P). 
This program organization is rather independent of the specific model equations; 


new models may be extracted merely by substituting a new empirical modeling routine. 
4.3: Compilation 


The program of Figure 4.3 can be fed input measurements which are in fact the 
results of a simulation; a model compilation is performed exactly this way. Parameters 
for an empirical model are then generated automatically, using input parameters for a 
theoretical model to drive the simulation. 

To examine the model accuracy obtainable with this approach, five parameter 
extraction programs were constructed and tested. The only differences among the 


programs were the numerical minimization algorithms used. Extractions were performed 
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Figure 4.3 Program organization 


with Hooke & Jeeves, Simplex, Conjugate Gradient, Fletcher, and Marquardt 
procedures. 

Parameter "compilations", as described in chapter 1, were performed: a set of 
parameters for the Berkeley model was supplied, and the extraction programs 
computed a corresponding set of Mosaid model parameters. The SPICE 2G.5 program 
was used to simulate measurements of test transistors (using the Berkeley model and 
the given parameters), and these measurements were fed to the extraction programs. 
Extraction algorithms minimized the error function (4.2.3), attempting to generate 
Mosaid model parameters which would predict exactly the same current-voltage 
behavior as the Berkeley parameters did. 

The simulated "measurements" must be carefully chosen to represent the whole 
range of anticipated device operation. {If no data is supplied for a particular region of 
operation, the resulting empirical model is forced to extrapolate to predict behavior in 


that region, resulting in higher modeling error. Short, long, wide, and narrow channel 
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devices should be included in the measurements, operating in cutoff, triode, and 
saturation. Both low and high-current measurements should be taken, and back-bias 


effects (Vopg >0) should be included. Table 4.4 shows the simulated measurements 
used in the compilation experiments. Six values of Vos» five values of Ves: four 


values of W/L, and two values of Vopg were simulated, for a total of 240 data points. 


Variable 


(W/L) | (100/100) (100/4) (4/100) (4/4) 


Vog 0 3 
Ve S 0.5 1 1.5 3 5 
Vos 0.05 1 2 3 5 6 


Table 4.4 Data points for parameter compilation 


The parameters used for the Berkeley theoretical model are shown in Table 4.5. 
They represent a typical contemporary n-channel MOS process, approximately of the 
same dimensions as "HMOS-1" [36]. 

A set of normalization constants were used to keep the elements of the Mosaid 
parameter vector approximately at the same order of magnitude (near unity); these are 
shown in Table 4.6. The starting values of the normalized parameters were all set to 
1.0. 

Explicit weighting was used to emphasize measurements at small currents; this 


boosts model accuracy near "Wr" and improves circuit simulations of inverter trip- 


| Parameter | Value 


0.633 | Volts 


= = 
UCRIT 
UEXP 
VMAX 


NEFF 


Table 4.5 Berkeley parameter values 


oO 
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points. The weighting constants are 


w,=40 if Imegg (i) S 


Wi = 1.0 otherwise 


ooo 


ae 


-2 Meters2/Volt-Second 
Volts/Meter 
Meters/Second 


W. 
< aoe —L 


(4.3.1) 


The choice of a "current floor" / in for equation (4.2.4) represents a tradeoff; low 


values of A oii give high accuracy at low currents with diminished accuracy at high 


currents. Large values of Lain have the opposite effect. Experiments were performed 


Table 4.6 Mosaid parameter normalization coefficients 


to find a value of baa that gives a reasonable balance between errors at low and high 


currents. 


Table 4.7 shows the total modeling error f(P) and RMS percentage’ error 
[100 ( f(P)/240 7 2 ] at four different values of Peat The error values cited in this 
table are the errors produced by Fletcher’s minimization algorithm. As / min increases, 


total error decreases, since the error contributions of low-current measurements are 
diminished. 
Figures 4.8 - 4.11 show the model fit, from parameters extracted by Fletcher’s 


algorithm, as a function of / min’ The I-V curves of a (100/4) transistor are plotted, 


simulated with the input theoretical model (solid curve) and with the resulting empirical 
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200 0.1280 


Table 4.7 Effect of lain OM modeling error 


model (dotted curve). One set of curves is given for low currents (Ves =0, 0.5, 1.0, 
.., 2.5) and one set is plotted for high currents (Ves =0, 1, 2, ..., 5). In both cases 
Vos ranges from 0 to 6 volts. 

Comparing Figures 4.8 and 4.9, it appears that the fit is not dramatically altered by 


raising / from 2 to 20 microamps. However, as / fri is increased to 60 and 200 


min n 


microamps (Figures 4.10 and 4.11), the curves begin to move and accuracy is improved 


at high currents. Eventually low-current accuracy is sacrificed, and increasing Lae 
worsens the problem. Based on these observations, a value of /, = 60 yA was 


selected to be incorporated into the final parameter compilation procedure. 


Ids 


Figure 4.8a 


Model Fit at Low Current, / 


min 


2 microamps. 
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Figure 4.8b 


Model Fit at High Current, / 


min 


= 2 microamps. 


Ids 


Figure 4.9a 
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Model Fit at Low Current, / 


min 


= 20 microamps. 
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Figure 4.95 Model Fit at High Current, / = 20 microamps. 


min 


Vds 
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Figure 4.10a Model Fit at Low Current, ae = 60 microamps. 
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Figure 4.10b 


Model Fit at High Current, / 


min 


= 60 microamps. 
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Figure 4.11a 


Model Fit at Low Current, / 


min 


= 200 microamps. 
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Figure 4.11b 


Model Fit at High Current, eae 


200 microamps. 


Vds 
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These parameter extractions were performed using a sum-of-squares measure of 
modeling error. Sum-of-squares tends to optimize the average case, because the error 
of each data point is summed into the final measurement. A different style of error 


measure can be defined which optimizes the worst case; it is called "minimax": 


g(P) = max le | (4.3.2) 


Note that the function g(P) in the minimax formulation is not differentiable; quasi- 
Newton algorithms therefore cannot minimize it directly. Although techniques exist for 
transforming a minimax problem into a sequence of differentiable functions [5], it is 
simpler to exploit a minimization algorithm which does not require differentiability. 

Far away from the minimax solution, it is likely that g(P) = 1, because some data 
=0 (see equation 4.2.4). 


point will give / 


meas red 


> 0 while the model predicts / Ps 
Small perturbations of the parameter vector P will not change g(P), and the 
minimization algorithm will terminate because the local gradient is zero. The minimax 
formulation therefore requires a starting point which is close to the final solution; in 
particular g(P) must be less than 1. A useful heuristic which often satisfies this 
requirement is to start the minimax algorithm with the final parameter vector produced 
by a sum-of-squares procedure; this effectively adds the execution times of the two 
programs. 

A version of the Hooke & Jeeves algorithm was coded which used (4.3.2) as the 
error function. The value of p was increased to 0.8, so the steplength did not shrink 
so quickly between iterations. This prevents the algorithm from overlooking a 
promising search direction. Modeling results using this minimax technique are shown 
in Figure 4.12. Comparing these with Figure 4.10, the overall curve-fits obtained by 
minimax optimization appear to be inferior to sum-of-squares results. Since there is no 


penalty (in terms of g) for increasing the error at a point which is not the worst-case, 


the minimax algorithm tolerates rather large errors at all points. 


Ids 


Figure 4.12a Model Fit at Low Current, in = 60 microamps. 


(Minimax error function) 


Ids 


9G 


Figure 4.12b 


Model Fit at High Current, ee 


(Minimax error function) 


= 60 microamps. 


ay ee 


4.4: Algorithm Performance 


Five extraction programs using sum-of-squares minimization, and one program 


using minimax, were monitored for three values of floor current: ae = 20, 60, and 200 


pA. Table 4.13 shows the performance of each algorithm when compiling the example 
model from the previous section. All experiments were performed on a VAX2 11/780 
computer running release 4.1 of the V7 UNIX? operating system. Programs were coded 
in the C language. 

The third column of Table 4.13 indicates how many times the error function f(P) 
(equation 4.2.3) was evaluated during the minimization, while the fourth column gives 
the total run time in user CPU-seconds. The last column shows the value of f(P) 
returned when the algorithm terminated. Function evaluations and CPU times given for 
the Hooke & Jeeves minimax algorithm include the first sum-of-squares phase. Note 
also that g(P) values from the minimax algorithm are not directly comparable to f(P) 
values produced by the other algorithms. 

Parameter extraction time is related to the complexity of the model; the longer it 
takes to evaluate f(P), the longer it will take to extract the model parameters. If f(P) 
is sufficiently expensive to compute, the computational overhead of the minimization 
algorithm becomes negligible, and execution speed is determined solely by the 
evaluations of f(P). Therefore the number of computations of f(P) should be 
considered along with the CPU time when deciding the relative merits of extraction 
algorithms. 


The Marquardt algorithm is designed to handle only sum-of-squares problems, yet 


av Ax is a registered trademark of Digital Equipment Corporation. 
UNIX is a registered trademark of Bell Telephone Laboratories. 
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Table 4.13 Parameter compilation speed of minimization algorithms 


it is not superior to Fletcher’s method except in the case /_,., = 200 pA. Recent data 


indicates that if the objective function is highly nonlinear, and the final error is 


nonzero, then special sum-of-squares algorithms are no better than general minimizers 


-79- 


[34]. 

Hooke & Jeeves is the generally the fastest algorithm, but it yields function values 
which are not minimal. If, however, the model is extremely expensive to evaluate, 
Hooke & Jeeves can be used to quickly locate a parameter-vector near the minimum. 
Then a slower, more accurate procedure (such as Marquardt’s) could seek out the final 
optimum. 

Fletcher's variable-metric algorithm consistently produces the lowest value of f(P), 
while maintaining a relatively small number of function evaluations and a rapid 
execution speed. It is therefore chosen as the single minimization procedure to be 
incorporated into the final parameter compiler. 

Table 4.14 gives the (unnormalized) values of the Mosaid parameters produced by 


Fletcher's algorithm for each value of | nin’ 


4.5: Simulation Speed 


To asses the benefits derived from using an empirical model, simulations were 
performed on two identical circuits. One was simulated using the Berkeley model, and 
the other was simulated using the compiled Mosaid model of the previous section. The 
simulations were identical in every other respect, so the difference in measured 
execution speed was caused by the difference in the models. 

This measurement was repeated for fourteen circuits, each having a different 
number of MOS transistors, to monitor the effect of circuit size on relative simulation 
speed. The circuits used are shown in Figure 4.15; they consist of an iterated parallel 
connection of 5-transistor modules. A parallel topology was chosen to guarantee that 
the simulator would evaluate every transistor at every timestep. Other topologies (such 
as a series connection) might contain nodes which stay at a constant value for many 
timesteps, allowing a simulator to bypass the evaluation of transistors connected to 


those nodes [3]. This would tend to invalidate size-versus-speed measurements, since 


Table 4.14 Compiled values of Mosaid model parameters 


the number of simulator evaluations is no longer proportional to circuit size. 

If all of the iterated modules in the parallel topology were identical, the waveforms 
at the internal nodes would be identical. A (very eiphistiGaied) static pre-analysis 
could notice this fact, and bypass the majority of the computation by simulating only 
one module. The circuit of Figure 4.15 avoids this problem by making transistor M1 a 
different size in each module. Channel length was selected by generating a random 
number in the range (np <L < 2p), and channel width was chosen from 
(20n < W < 30p). 


The results of these experiments are shown in Tables 4.16 and 4.17; they are also 
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Figure 4.15 Variable-size circuit for speed experiments 


plotted (in normalized form) as Figure 1.1. Measured SPICE execution time for a circuit 


of n transistors was fitted to the equation 


Time = C+ Kné 


Sum-of-squares fitting was used to find optimum values of C, K, and —; the worst-case 


error at any single point was 6%. The computed optimum values are 


Model Cc K E 


Mosaid 


Berkeley | 2.455 | 5.125 | 1.042 


The “average speedup" achieved by using the Mosaid model instead of the Berkeley 


model is K Mosaid / K Berkeley = 0.6435. That is, SPICE simulations using the Mosaid 


model will take 65% as long as the same simulations using the Berkeley model. 
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Table 4.16 Simulation speed using Mosaid model 


Table 4.17 Simulation speed using Berkeley model 


Chapter 5: Conclusions and Directions for Further Research 


The principal goal of this effort has been to demonstrate the practicality of a two- 
level modeling scheme. An automatic parameter extraction program was exhibited in 
chapter 4, proving the feasibility of the proposed translation mechanism. Parameters 
for the Berkeley theoretical model were successtully compiled into parameters for the 
Mosaid empirical model. Errors introduced by the second level of modeling were smail 
(<5%), which suggests that simulation accuracy will not. be appreciably degraded. 

Several numerical minimization algorithms were tested, and Fietcher’s switching- 
policy variable metric algorithm was found best for extracting Mosaid model parameters. 
The Marquardt algorithm did not offer substantial improvements in speed or accuracy, 
even though it is specifically designed to operate on parameter-extraction problems. A 
"current floor" of 60 microamperes was found to provide a good tradeoff between 
accuracy at low currents and accuracy at high currents. 

The effect of model complexity on total simulation time was measured, and it was 
found that simple empirical models provide substantial improvements over theoretical 
models. Speedups approaching 35 percent were observed, indicating that the two-level 
modeling technique will allow significant increases in simulation productivity. Much 
additional investigation remains to be done; this chapter outlines several areas in which 


the two-level modeling idea can be extended and improved. 
5.1: Highly Nonlinear Models 


The Mosaid model used in the compilation experiments is characterized.by a small 
set of equations, in which each model parameter is a coefficient for a simple 
polynomial. Predicted currents are gently varying functions of parameter values, so — 
minimization procedures which operate on first derivatives will give fast convergence. 


If, however, a model with a strong nonlinear relationship between parameter values and 


predicted currents is chosen, these algorithms will converge much more stowly [25]. 
This suggests a combined approach, using a function-comparison method to quickly 
find a point near the optimum (where the error in the Newton approximation is small). 


Then a quasi-Newton procedure could be used for the final convergence stage. 


5.2: Nearly Redundant Parameters 


Model equations may be defined such that some parameters are redundant or 


nearly redundant. For example, the equation / = K 4K oV is a poor model of a resistor 
because Ky is redundant: there are an infinite number of (K 1 Ko) pairs which predict 


the same current-voltage behavior. 
The Berkeley model suffers from this problem, because device gain is set by the 


ratio of two input parameters: 


This apparent redundancy can send some minimization algorithms into slightly-damped 


oscillation, repeatedly varying Ho and Le until a stable solution is stumbled upon. The 


two parameters are not completely redundant, but the differences between them are 


only manifested in second order effects (such as the body-effect coefficient Yg)- 
Parameter extraction is therefore extremely slow, because the distinction between Bo 
and ly is only apparent near the optimum point. 


Preliminary extraction experiments with the Berkeley model tend to indicate that 
simple function-comparison procedures (such as the Simplex algorithm) converge 
sooner than quasi-Newton routines, particularly if the starting point is far away from the 


optimum. Other extraction efforts on the Berkeley model [46] have ignored the 


redundant-parameter problem, preferring to fix one of the two parameters as a 


constant, and using a quasi-Newton algorithm to extract the other. 
5.3: Incremental Extraction 


To adequately represent device behavior in all important regions of operation, the 
parameter extraction algorithm requires many input measurements. For each 
measurement, a model prediction must be calculated, so the time required to perform 
an extraction is proportional to the number of measurements supplied. Unfortunately, 
there seems to be a tradeoff between extraction time and goodness of fit. 

This difficulty may lend itself to a divide-and-conquer approach: an extraction is 
performed on a small subset of the data, producing an optimal point in the parameter 
space for that subset. Then a full extraction is run on the entire set of measurements, 
using this solution as a starting point. The improved starting point will allow the full 
extraction to use fewer iterations to compute the optimum parameter values. If several 
iterations of the full extraction can be eliminated, the net savings in extraction time will 


be substantial. 
5.4: Multiple Models 


Circuit simulators have traditionally implemented a single model, with a single set 
of parameters, to represent all transistors that could possibly be built with a given 
fabrication technology. This restriction could be relaxed, allowing model parameters to 
be optimized for a particular subset of all possible transistors. An obvious example 
would be to extract four sets of parameters, separately modeling the four different 


size-classes of transistors: 
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Width Class 
> Op Wide, Long 
> Op Wide, Short 


Narrow, Long 


Narrow, Short 


Model parameters could be separately optimized within each class, which permits 
better curve-fits and lower total model error. This approach would especially improve 
modeling of narrow, short devices, because interactions between narrow and _ short 
channel effects would be explicitly accounted for. A simple preprocessing stage in the 
circuit simulator would assign each transistor in the simulated circuit to one of these 


classes, so the proper set of parameters is always used. 
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